Apache Hive

The Apache Hive data warehouse framework facilitates querying and managing large datasets that reside in a distributed store or file system such as the Hadoop Distributed File System (HDFS). The following are a few highlights of this project:

Hive offers a way to map a tabular structure onto data stored in distributed storage.
Hive supports most of the data types available in many popular relational database platforms.
Hive has various built-in functions, types, etc. for handling many commonly performed operations.
Hive allows querying of the data from distributed storage through the mapped tabular structure.
Hive offers several features similar to those of relational databases, such as partitioning, indexing, and external tables.
Hive keeps its internal data (the system catalog), such as metadata about Hive tables and partitioning information, in a separate database known as the Hive Metastore.
Hive queries are written in a SQL-like language known as HiveQL.
Hive also allows plugging in custom mappers, custom reducers, custom user-defined functions, etc. to perform more sophisticated operations.
HiveQL queries are executed via MapReduce: when a HiveQL query is issued, it triggers one or more Map and/or Reduce jobs to perform the operation defined in the query.
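
To make the last point more concrete, the following is a minimal pure-Python sketch of how a simple aggregation query could be broken into a map step, a shuffle step, and a reduce step. It is an illustration only, not Hive's actual execution plan: the table `page_views`, its columns, the sample rows, and all function names are hypothetical, and the real framework distributes these steps across many machines.

```python
from collections import defaultdict

# Hypothetical rows of a table page_views(user_id, page). In Hive these would
# be read from files in distributed storage; here they are an in-memory list.
ROWS = [
    ("u1", "/home"),
    ("u2", "/home"),
    ("u1", "/cart"),
    ("u3", "/home"),
    ("u2", "/cart"),
    ("u1", "/home"),
]


def map_phase(rows):
    """Emit one (key, 1) pair per row, keyed by the GROUP BY column."""
    for _user_id, page in rows:
        yield page, 1


def shuffle(pairs):
    """Group mapper output by key, as happens between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key, which corresponds to COUNT(*)."""
    return {key: sum(values) for key, values in grouped.items()}


def run_query(rows):
    # Conceptually: SELECT page, COUNT(*) FROM page_views GROUP BY page;
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    for page, views in sorted(run_query(ROWS).items()):
        print(page, views)
```

With the sample rows above, the script prints a count of 2 for `/cart` and 4 for `/home`. The grouping column becomes the key emitted by the mapper, and the aggregate function becomes the work done by the reducer, which is roughly the shape a `GROUP BY` query takes when it runs as a MapReduce job.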

Additional Information: Home Page | Wiki | Documentation/User Guide/Reference Manual | Mailing Lists
